{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# eICU Collaborative Research Database\n",
    "\n",
    "# Notebook 1: Exploring the `patient` table\n",
    "\n",
    "The aim of this notebook is to introduce the `patient` table, a key table in the eICU Collaborative Research Database. \n",
    "\n",
    "The `patient` table contains patient demographics and admission and discharge details for hospital and ICU stays. For more detail, see: http://eicu-crd.mit.edu/eicutables/patient/\n",
    "\n",
    "Before starting, you will need to copy the eicu demo database file ('eicu_demo.sqlite3') to the `data` directory.\n",
    "\n",
    "Documentation on the eICU Collaborative Research Database can be found at: http://eicu-crd.mit.edu/. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Getting set up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import psycopg2\n",
    "import os\n",
    "import sqlite3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Plot settings\n",
    "%matplotlib inline\n",
    "plt.style.use('ggplot')\n",
    "fontsize = 20 # size for x and y ticks\n",
    "plt.rcParams['legend.fontsize'] = fontsize\n",
    "plt.rcParams.update({'font.size': fontsize})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Connect to the database - which is assumed to be in the current directory\n",
    "fn = 'eicu_demo.sqlite3'\n",
    "con = sqlite3.connect(fn)\n",
    "cur = con.cursor()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Display list of tables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "query = \\\n",
    "\"\"\"\n",
    "SELECT type, name\n",
    "FROM sqlite_master \n",
    "WHERE type='table'\n",
    "ORDER BY name;\n",
    "\"\"\"\n",
    "\n",
    "list_of_tables = pd.read_sql_query(query,con)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "list_of_tables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Reviewing the patient table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# query to load data from the patient table\n",
    "query = \\\n",
    "\"\"\"\n",
    "SELECT *\n",
    "FROM patient\n",
    "\"\"\"\n",
    "\n",
    "print(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# run the query and assign the output to a variable\n",
    "patient_tab = pd.read_sql_query(query,con)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# display the first few rows of the dataframe\n",
    "patient_tab.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# list all of the columns in the table\n",
    "patient_tab.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Questions\n",
    "\n",
    "- What does `patientunitstayid` represent? (hint, see: http://eicu-crd.mit.edu/eicutables/patient/)\n",
    "- What does `patienthealthsystemstayid` represent?\n",
    "- What does `uniquepid` represent?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# select a limited number of columns to view\n",
    "columns = ['uniquepid', 'patientunitstayid','gender','age','unitdischargestatus']\n",
    "patient_tab[columns].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# what are the unique values for age?\n",
    "age_col = 'age'\n",
    "patient_tab[age_col].sort_values().unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Questions\n",
    "\n",
    "- Try plotting a histogram of ages using the commands in the cell below. Why does the plot fail?\n",
    "\n",
    "```python\n",
    "# try plotting a histogram of ages\n",
    "figsize = (18,8)\n",
    "patient_tab[age_col].plot(kind='hist',\n",
    "                          figsize=figsize, \n",
    "                          fontsize=fontsize,\n",
    "                          bins=15)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# create a column containing numerical ages\n",
    "# If ‘coerce’, then invalid parsing will be set as NaN\n",
    "agenum_col = 'age_num'\n",
    "patient_tab[agenum_col] = pd.to_numeric(patient_tab[age_col], errors='coerce')\n",
    "patient_tab[agenum_col].sort_values().unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# try plotting a histogram of ages\n",
    "figsize = (18,8)\n",
    "patient_tab[agenum_col].plot(kind='hist',\n",
    "                             figsize=figsize, \n",
    "                             fontsize=fontsize,\n",
    "                             bins=15)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Questions\n",
    "\n",
    "- Use the `mean()` method to find the mean age (hint: `patient_tab[agenum_col].mean()`). What is the mean? Why might we expect this to be lower than the true mean?\n",
    "- In the same way that you use `.mean()`, you can use `.describe()`. Use the `describe()` method to explore the `admissionweight` of patients in kg. What issue do you see? What are some methods that you could use to deal with this issue?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# set threshold based on 99th quantile\n",
    "adweight_col = 'admissionweight'\n",
    "quant = patient_tab[adweight_col].quantile(0.99)\n",
    "patient_tab[patient_tab[adweight_col] > quant] = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# describe the admission weights\n",
    "patient_tab[adweight_col].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Questions\n",
    "\n",
    "- What is the average change in weight between `admissionweight` and `dischargeweight`?\n",
    "- Plot a distribution of the weight change"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# set threshold based on 99th quantile\n",
    "disweight_col = 'dischargeweight'\n",
    "quant = patient_tab[disweight_col].quantile(0.99)\n",
    "patient_tab[patient_tab[disweight_col] > quant] = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# describe the discharge weights\n",
    "patient_tab[disweight_col].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "patient_tab['weight_change'] = patient_tab[adweight_col] - patient_tab[disweight_col]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# plot the weight changes\n",
    "figsize = (18,8)\n",
    "patient_tab['weight_change'].plot(kind='hist',\n",
    "                             figsize=figsize, \n",
    "                             fontsize=fontsize,\n",
    "                             bins=50)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
